Suppose that at any stage of a statistical experiment a control variable $X$ that affects the distribution of the
observed data $Y$ at this stage can be used. The distribution of $Y$ depends on some unknown parameter $\theta$, and
we consider the problem of testing multiple hypotheses $H_1: \theta = \theta_1$, $H_2: \theta = \theta_2$, $\dots$, $H_k: \theta = \theta_k$ allowing
the data to be controlled by $X$, in the following sequential context. The experiment starts with assigning a
value $X_1$ to the control variable and observing $Y_1$ as a response. After some analysis, another value $X_2$ for the
control variable is chosen, and $Y_2$ as a response is observed, etc. It is supposed that the experiment eventually
stops, and at that moment a final decision in favor of one of the hypotheses $H_1, \dots, H_k$ is to be taken. In this
article, our aim is to characterize the structure of optimal sequential testing procedures based on data obtained
from an experiment of this type in the case when the observations $Y_1, Y_2, \dots, Y_n$ are independent, given controls
$X_1, X_2, \dots, X_n$, $n = 1, 2, \dots$.

1 INTRODUCTION. PROBLEM SET-UP.

Let us suppose that at any stage of a statistical experiment a "control variable" $X$ can be used, that
affects the distribution of the observed data $Y$ at this stage. "Statistical" means that the distribution of
$Y$ depends on some unknown parameter $\theta$, and we have the usual goal of statistical analysis: to obtain
some information about the true value of $\theta$. In this work, we consider the problem of testing multiple
hypotheses $H_1: \theta = \theta_1$, $H_2: \theta = \theta_2$, $\dots$, $H_k: \theta = \theta_k$ allowing the data to be controlled by $X$, in the
following "sequential" context.

The experiment starts with assigning a value $X_1$ to the control variable and observing $Y_1$ as a response.
After some analysis, we choose another value $X_2$ for the control variable, and observe $Y_2$ as a response.
Analyzing this, we choose $X_3$ for the third stage, get $Y_3$, and so on. In this way, we obtain a sequence
$X_1, \dots, X_n, Y_1, \dots, Y_n$ of experimental data, $n = 1, 2, \dots$. It is supposed that the experiment eventually
stops, and at that moment a final decision in favor of one of $H_1, \dots, H_k$ is to be taken.

In this article, our aim is to characterize the structure of optimal sequential procedures, based on this
type of data, for testing the multiple hypotheses $H_1, \dots, H_k$.

We follow [5] and [10] in our interpretation of "control variables". For example, in a regression
experiment with a dependent variable $Y$ and an independent variable $X$, the variable $X$ is a control
variable in our sense whenever the experimenter can vary its value before the next observation is taken.
Another classical context for "control variables" in our sense is experimental design, when one of
several alternative treatments is assigned to every experimental unit before the experiment starts. The
randomization, which is frequently used with both these types of "controlled" experiments, can be easily
incorporated in our theory below as well.

There exists yet another concept of "control variables", introduced by Haggstrom [2] and largely used
in [9] and many subsequent articles (see also [1] for results, closely related to [9], where "control variables"
are not used). In the context of [9], a control variable, roughly speaking, is an integer variable whose
value, at every stage of the experiment, is a prescription of a number of the additional observations to be






ANDREY NOVIKOV 



taken at the next stage, if any. To some extent, it is related to our control variables as well, because it
affects the distribution of subsequently observed data. It is very likely that our method will work for this
type of "sequentially planned" experiments as well, but formally it does not fit our theory below, mainly
because we do not allow the cost of observations to depend on $X$.

In this article, we follow very closely our article [5], where the case of $k = 2$ simple hypotheses was
considered, and use a method based on the same ideas as in [7], where multiple hypothesis testing for
experiments without control variables was studied.

For data vectors, let us write, briefly, $X^{(n)}$ instead of $(X_1, \dots, X_n)$, $Y^{(n)}$ instead of $(Y_1, \dots, Y_n)$,
$n = 1, 2, \dots$, etc. Let us define a (randomized) sequential hypothesis testing procedure as a triplet
$(\chi, \psi, \phi)$ of a control policy $\chi$, a stopping rule $\psi$, and a decision rule $\phi$, with

$$\chi = (\chi_1, \chi_2, \dots, \chi_n, \dots), \quad \psi = (\psi_1, \psi_2, \dots, \psi_n, \dots), \quad \phi = (\phi_1, \phi_2, \dots, \phi_n, \dots),$$

with the components described below.
The functions

$$\chi_n = \chi_n(x^{(n-1)}, y^{(n-1)}), \quad n = 1, 2, \dots$$

are supposed to be measurable functions with values in the space of values of the control variable. The
functions

$$\psi_n = \psi_n(x^{(n)}, y^{(n)}), \quad n = 1, 2, \dots$$

are supposed to be some measurable functions with values in $[0, 1]$. Finally,

$$\phi_n = (\phi_{n1}, \phi_{n2}, \dots, \phi_{nk}),$$

with

$$\phi_{ni} = \phi_{ni}(x^{(n)}, y^{(n)}),$$

are supposed to be measurable non-negative functions such that

$$\sum_{i=1}^{k} \phi_{ni} = 1 \quad \text{for any } n = 1, 2, \dots.$$

The interpretation of all these functions is as follows.

The experiment starts at stage $n = 1$ by applying $\chi_1$ to determine the initial control $x_1$. Using this
control, the first data $y_1$ is observed.

At any stage $n \ge 1$: the value of $\psi_n(x^{(n)}, y^{(n)})$ is interpreted as the conditional probability to stop
and proceed to decision making, given that we came to that stage and that the observations were
$(y_1, y_2, \dots, y_n)$ after the respective controls $(x_1, x_2, \dots, x_n)$ have been applied. If there is no stop, the
experiment continues to the next stage $(n + 1)$, defining first the new control value $x_{n+1}$ by applying
the control policy:

$$x_{n+1} = \chi_{n+1}(x_1, \dots, x_n; y_1, \dots, y_n)$$

and then taking an additional observation $y_{n+1}$ using control $x_{n+1}$. Then the rule $\psi_{n+1}$ is applied to
$(x_1, \dots, x_{n+1}; y_1, \dots, y_{n+1})$ in the same way as above, etc., until the experiment eventually stops.

It is supposed that when the experiment stops, a decision to accept one and only one of $H_1, \dots, H_k$ is
to be made. The function $\phi_{ni}(x^{(n)}, y^{(n)})$ is interpreted as the conditional probability to accept $H_i$, given
that the experiment stops at stage $n$, $(y_1, \dots, y_n)$ being the data vector observed and $(x_1, \dots, x_n)$ the
respective controls applied.
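
The mechanics of a randomized procedure $(\chi, \psi, \phi)$ described above can be sketched in code. The following is only an illustrative simulation loop, not part of the theory: the names `run_procedure`, `sample_y` and `max_stages` are assumptions introduced here, and the observation model is supplied by the caller.

```python
import random

def run_procedure(chi, psi, phi, sample_y, rng, max_stages=10_000):
    """Simulate one run of a randomized sequential procedure (illustrative sketch).

    chi(xs, ys)      -> next control, computed from past controls and data
    psi(xs, ys)      -> probability in [0, 1] of stopping at the current stage
    phi(xs, ys)      -> list of k probabilities (summing to 1) of accepting H_1..H_k
    sample_y(x, rng) -> one observation taken under control x (hypothetical model)
    Returns (stopping stage, accepted hypothesis index 1..k, controls, data).
    """
    xs, ys = [], []
    for _ in range(max_stages):
        x = chi(xs, ys)                  # x_{n+1} = chi_{n+1}(x^(n), y^(n))
        xs.append(x)
        ys.append(sample_y(x, rng))      # observe y_{n+1} under control x_{n+1}
        if rng.random() < psi(xs, ys):   # stop with probability psi_n(x^(n), y^(n))
            probs = phi(xs, ys)
            # accept H_j with probability phi_{nj}
            j = rng.choices(range(1, len(probs) + 1), weights=probs)[0]
            return len(ys), j, xs, ys
    raise RuntimeError("the procedure did not stop within max_stages")
```

A non-randomized procedure is the special case in which `psi` and the components of `phi` take only the values 0 and 1.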

The control policy $\chi$ generates, by the above process, a sequence of random variables $X_1, X_2, \dots, X_n, \dots$
recursively by

$$X_{n+1} = \chi_{n+1}(X^{(n)}, Y^{(n)}).$$

The stopping rule $\psi$ generates, by the above process, a random variable $\tau_\psi$ (stopping time) whose distribution
is given by

$$P_\theta^\chi(\tau_\psi = n) = E_\theta^\chi (1 - \psi_1)(1 - \psi_2) \cdots (1 - \psi_{n-1}) \psi_n. \tag{1}$$

Here, and throughout the paper, we interchangeably use $\psi_n$ both for

$$\psi_n(x^{(n)}, y^{(n)})$$

and for

$$\psi_n(X^{(n)}, Y^{(n)}),$$

and so do we for any other function

$$F_n = F_n(x^{(n)}, y^{(n)}).$$

This does not cause any problem if we adopt the following agreement: when $F_n$ is under a probability or
expectation sign, it is $F_n(X^{(n)}, Y^{(n)})$, otherwise it is $F_n(x^{(n)}, y^{(n)})$.
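For a sequential testing procedure $(\chi, \psi, \phi)$ let us define

$$\alpha_{ij}(\chi, \psi, \phi) = P_{\theta_i}^\chi(\text{accept } H_j) = \sum_{n=1}^{\infty} E_{\theta_i}^\chi (1 - \psi_1) \cdots (1 - \psi_{n-1}) \psi_n \phi_{nj} \tag{2}$$

and

$$\beta_i(\chi, \psi, \phi) = P_{\theta_i}^\chi(\text{accept any } H_j \text{ different from } H_i) = \sum_{j \ne i} \alpha_{ij}(\chi, \psi, \phi). \tag{3}$$

The probabilities $\alpha_{ij}(\chi, \psi, \phi)$ for $j \ne i$ can be considered "individual" error probabilities and $\beta_i(\chi, \psi, \phi)$
the "gross" error probability, under hypothesis $H_i$, $i = 1, 2, \dots, k$, of the sequential testing procedure $(\chi, \psi, \phi)$.
Another important characteristic of a sequential testing procedure is the average sample number:

$$N(\theta; \chi, \psi) = E_\theta^\chi \tau_\psi = \begin{cases} \sum_{n=1}^{\infty} n P_\theta^\chi(\tau_\psi = n) & \text{if } P_\theta^\chi(\tau_\psi < \infty) = 1, \\ \infty & \text{otherwise.} \end{cases} \tag{4}$$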

In this article, we solve the two following problems:

Problem I. Minimize $N(\chi, \psi) = N(\theta_1; \chi, \psi)$ over all sequential testing procedures $(\chi, \psi, \phi)$ subject to

$$\alpha_{ij}(\chi, \psi, \phi) \le \alpha_{ij} \quad \text{for any } i = 1, \dots, k, \text{ and for any } j \ne i, \tag{5}$$

where $\alpha_{ij} \in (0, 1)$ (with $i, j = 1, \dots, k$, $j \ne i$) are some constants.

Problem II. Minimize $N(\chi, \psi) = N(\theta_1; \chi, \psi)$ over all sequential testing procedures $(\chi, \psi, \phi)$ subject to

$$\beta_i(\chi, \psi, \phi) \le \beta_i \quad \text{for any } i = 1, \dots, k, \tag{6}$$

with some constants $\beta_i \in (0, 1)$, $i = 1, \dots, k$.

In Section 2 we reduce the problem of minimizing $N(\chi, \psi)$ under constraints (5) (or (6)) to an
unconstrained minimization problem. The new objective function is the Lagrange-multiplier function

$$L(\chi, \psi, \phi).$$

Then, finding

$$L(\chi, \psi) = \inf_{\phi} L(\chi, \psi, \phi),$$

we reduce the problem further to a problem of finding an optimal control policy and stopping rule.

In Section 3 we solve the problem of minimization of $L(\chi, \psi)$ in a class of control-and-stopping strategies.

In Section 4 the likelihood ratio structure for the optimal strategy is given.

In Section 5 we apply the results obtained in Sections 2-4 to the solution of Problems I and II.
The final Section 6 contains some additional results, examples and discussion.

2 REDUCTION TO A PROBLEM OF OPTIMAL CONTROL 
AND STOPPING 

In this section, Problems I and II will be reduced to unconstrained optimization problems using the idea
of the Lagrange multipliers method.

2.1 Reduction to Non-Constrained Minimization in Problems I and II

The following two theorems are practically Theorem 1 and Theorem 2 in [7]. They reduce Problem I and
Problem II to respective unconstrained minimization problems, using the idea of the Lagrange multipliers
method.

For Problem I, let us define $L(\chi, \psi, \phi)$ as

$$L(\chi, \psi, \phi) = N(\chi, \psi) + \sum_{1 \le i, j \le k;\ i \ne j} \lambda_{ij} \alpha_{ij}(\chi, \psi, \phi), \tag{7}$$

where $\lambda_{ij} > 0$ are some constant multipliers.

Let $\Delta$ be a class of sequential testing procedures.

Theorem 1. Let there exist $\lambda_{ij} > 0$, $i = 1, \dots, k$, $j = 1, \dots, k$, $j \ne i$, and a testing procedure $(\chi^*, \psi^*, \phi^*) \in \Delta$
such that for any other testing procedure $(\chi, \psi, \phi) \in \Delta$

$$L(\chi^*, \psi^*, \phi^*) \le L(\chi, \psi, \phi) \tag{8}$$

holds (with $L(\chi, \psi, \phi)$ defined by (7)), and such that

$$\alpha_{ij}(\chi^*, \psi^*, \phi^*) = \alpha_{ij} \quad \text{for any } i = 1, \dots, k, \text{ and for any } j \ne i. \tag{9}$$

Then for any testing procedure $(\chi, \psi, \phi) \in \Delta$ such that

$$\alpha_{ij}(\chi, \psi, \phi) \le \alpha_{ij} \quad \text{for any } i = 1, \dots, k, \text{ and for any } j \ne i, \tag{10}$$

it holds

$$N(\chi^*, \psi^*) \le N(\chi, \psi). \tag{11}$$

The inequality in (11) is strict if at least one of the inequalities (10) is strict.
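
The reasoning behind Theorem 1 is the standard Lagrange-multiplier argument. The following chain is a sketch written in the notation of (7)-(11), assuming $N(\chi, \psi) < \infty$ (otherwise (11) is trivial); it is not quoted from [7].

$$
\begin{aligned}
N(\chi^*, \psi^*) + \sum_{i \ne j} \lambda_{ij} \alpha_{ij}
  &= L(\chi^*, \psi^*, \phi^*) && \text{by (7) and (9)} \\
  &\le L(\chi, \psi, \phi) = N(\chi, \psi) + \sum_{i \ne j} \lambda_{ij} \alpha_{ij}(\chi, \psi, \phi) && \text{by (8) and (7)} \\
  &\le N(\chi, \psi) + \sum_{i \ne j} \lambda_{ij} \alpha_{ij} && \text{by (10)}.
\end{aligned}
$$

Cancelling the finite sum $\sum_{i \ne j} \lambda_{ij} \alpha_{ij}$ on both sides gives (11). Because every $\lambda_{ij}$ is strictly positive, a strict inequality in (10) makes the last step strict, and hence (11) strict.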

For Problem II, let now $L(\chi, \psi, \phi)$ be defined as

$$L(\chi, \psi, \phi) = N(\chi, \psi) + \sum_{i=1}^{k} \lambda_i \beta_i(\chi, \psi, \phi), \tag{12}$$

where $\lambda_i > 0$ are the Lagrange multipliers.

Theorem 2. Let there exist $\lambda_i > 0$, $i = 1, \dots, k$, and a testing procedure $(\chi^*, \psi^*, \phi^*) \in \Delta$ such that for any
other testing procedure $(\chi, \psi, \phi) \in \Delta$

$$L(\chi^*, \psi^*, \phi^*) \le L(\chi, \psi, \phi) \tag{13}$$

holds (with $L(\chi, \psi, \phi)$ defined by (12)), and such that

$$\beta_i(\chi^*, \psi^*, \phi^*) = \beta_i \quad \text{for any } i = 1, \dots, k. \tag{14}$$

Then for any testing procedure $(\chi, \psi, \phi) \in \Delta$ such that

$$\beta_i(\chi, \psi, \phi) \le \beta_i \quad \text{for any } i = 1, \dots, k, \tag{15}$$

it holds

$$N(\chi^*, \psi^*) \le N(\chi, \psi). \tag{16}$$

The inequality in (16) is strict if at least one of the inequalities (15) is strict.

2.2 Optimal Decision Rules

Due to Theorems 1 and 2, Problem I is reduced to minimizing (7) and Problem II is reduced to minimizing
(12). But (12) is a particular case of (7), namely, when $\lambda_{ij} = \lambda_i$ for any $j = 1, \dots, k$, $j \ne i$ (see (2) and
(3)). Because of that, we will only solve the problem of minimizing $L(\chi, \psi, \phi)$ defined by (7).
In particular, in this section we find

$$\inf_{\phi} L(\chi, \psi, \phi),$$

and the corresponding decision rule $\phi$ at which this infimum is attained.
Let $I_A$ be the indicator function of the event $A$.

From this time on, we suppose that for any $n = 1, 2, \dots$, the random variable $Y_n$, when a control $x$ is
applied, has a probability "density" function

$$f_\theta(y|x) \tag{17}$$

(Radon-Nikodym derivative of its distribution) with respect to a $\sigma$-finite measure $\mu$ on the respective
space. We are supposing as well that, at any stage $n \ge 1$, given control values $x_1, x_2, \dots, x_n$ applied, the
observations $Y_1, Y_2, \dots, Y_n$ are independent, i.e. their joint probability density function, conditionally on
given controls $x_1, x_2, \dots, x_n$, can be calculated as

$$f_\theta^n(x_1, \dots, x_n; y_1, \dots, y_n) = \prod_{i=1}^{n} f_\theta(y_i|x_i), \tag{18}$$

with respect to the product measure $\mu^n = \mu \otimes \dots \otimes \mu$ of $\mu$ $n$ times by itself. It is easy to see that any
expectation which uses a control policy $\chi$ can be expressed as

$$E_\theta^\chi g(Y^{(n)}) = \int g(y^{(n)}) f_\theta^{n,\chi}(y^{(n)}) \, d\mu^n(y^{(n)}),$$

where

$$f_\theta^{n,\chi}(y^{(n)}) = \prod_{i=1}^{n} f_\theta(y_i|x_i),$$

with

$$x_i = \chi_i(x^{(i-1)}, y^{(i-1)}) \tag{19}$$

for any $i = 1, 2, \dots$.

Similarly, for any function $F_n = F_n(x^{(n)}, y^{(n)})$ let us define

$$F_n^\chi(y^{(n)}) = F_n(x^{(n)}, y^{(n)}),$$

where $x^{(n)}$ is defined by (19).

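As a first step of minimization of $L(\chi, \psi, \phi)$, let us prove the following

Theorem 3. For any $\lambda_{ij} > 0$, $i = 1, \dots, k$, $j \ne i$, and for any sequential testing procedure $(\chi, \psi, \phi)$

$$L(\chi, \psi, \phi) \ge N(\chi, \psi) + \sum_{n=1}^{\infty} \int (1 - \psi_1^\chi) \cdots (1 - \psi_{n-1}^\chi) \psi_n^\chi \, l_n^\chi \, d\mu^n, \tag{20}$$

where

$$l_n = \min_{1 \le j \le k} \sum_{i \ne j} \lambda_{ij} f_{\theta_i}^n. \tag{21}$$

The right-hand side of (20) is attained if

$$\phi_{nj} \le I_{\{\sum_{i \ne j} \lambda_{ij} f_{\theta_i}^n = l_n\}} \tag{22}$$

for any $n = 1, 2, \dots$ and for any $j = 1, \dots, k$.

Proof. Let us suppose that $N(\chi, \psi) < \infty$, otherwise (20) is trivial. Then let us prove an inequality
equivalent to (20):

$$\sum_{1 \le i, j \le k;\ j \ne i} \lambda_{ij} \alpha_{ij}(\chi, \psi, \phi) \ge \sum_{n=1}^{\infty} \int (1 - \psi_1^\chi) \cdots (1 - \psi_{n-1}^\chi) \psi_n^\chi \, l_n^\chi \, d\mu^n. \tag{23}$$

The left-hand side of it can be represented as

$$\sum_{1 \le i, j \le k;\ j \ne i} \lambda_{ij} \alpha_{ij}(\chi, \psi, \phi) = \sum_{n=1}^{\infty} \int (1 - \psi_1^\chi) \cdots (1 - \psi_{n-1}^\chi) \psi_n^\chi \sum_{j=1}^{k} \phi_{nj}^\chi \Bigl( \sum_{1 \le i \le k;\ i \ne j} \lambda_{ij} f_{\theta_i}^{n,\chi} \Bigr) d\mu^n \tag{24}$$

(see (2)).

Applying Lemma 1 of [7] to each summand on the right-hand side of (24) we immediately have:

$$\sum_{1 \le i, j \le k;\ j \ne i} \lambda_{ij} \alpha_{ij}(\chi, \psi, \phi) \ge \sum_{n=1}^{\infty} \int (1 - \psi_1^\chi) \cdots (1 - \psi_{n-1}^\chi) \psi_n^\chi \, l_n^\chi \, d\mu^n \tag{25}$$

with an equality if

$$\phi_{nj} \le I_{\{\sum_{i \ne j} \lambda_{ij} f_{\theta_i}^n = l_n\}}$$

for any $n = 1, 2, \dots$ and for any $1 \le j \le k$. □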
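
The decision rule of Theorem 3 accepts a hypothesis $H_j$ whose weighted sum $\sum_{i \ne j} \lambda_{ij} f_{\theta_i}^n$ attains the minimum $l_n$ in (21). A minimal sketch of that computation follows; the function names and the convention of breaking ties in favor of the smallest index are assumptions made here for illustration (any randomization among the minimizers also satisfies (22)).

```python
def decision_statistics(lam, dens):
    """Return (l_n, sums), where sums[j] = sum over i != j of lam[i][j] * dens[i].

    lam[i][j] -- multiplier lambda_{ij} (0-based indices, diagonal ignored)
    dens[i]   -- value of the joint density f^n under hypothesis H_{i+1}
    """
    k = len(dens)
    sums = [sum(lam[i][j] * dens[i] for i in range(k) if i != j)
            for j in range(k)]
    return min(sums), sums

def optimal_decision(lam, dens):
    """1-based index of a hypothesis attaining l_n (ties go to the smallest index)."""
    l_n, sums = decision_statistics(lam, dens)
    return sums.index(l_n) + 1
```

With all multipliers equal, the rule reduces to accepting the hypothesis with the largest joint density, since omitting the largest term from the sum minimizes it.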
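Remark 1. It is easy to see, using (4) and (25), that

$$L(\chi, \psi) = \inf_{\phi} L(\chi, \psi, \phi) = \sum_{n=1}^{\infty} \int (1 - \psi_1^\chi) \cdots (1 - \psi_{n-1}^\chi) \psi_n^\chi \bigl( n f_{\theta_1}^{n,\chi} + l_n^\chi \bigr) d\mu^n, \tag{26}$$

with $l_n$ defined by (21), if $P_{\theta_1}^\chi(\tau_\psi < \infty) = 1$, and $L(\chi, \psi) = \infty$ otherwise.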

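Problem I is reduced now to the problem of finding strategies $(\chi, \psi)$ which minimize $L(\chi, \psi)$. Indeed,
if there is a $(\chi^*, \psi^*)$ such that

$$L(\chi^*, \psi^*) = \inf_{(\chi, \psi)} L(\chi, \psi),$$

then for any $\phi^*$ satisfying

$$\phi_{nj}^* \le I_{\{\sum_{i \ne j} \lambda_{ij} f_{\theta_i}^n = l_n\}}$$

(see (22)), by Theorem 3, for any $(\chi, \psi, \phi)$

$$L(\chi^*, \psi^*, \phi^*) = L(\chi^*, \psi^*) \le L(\chi, \psi) \le L(\chi, \psi, \phi),$$

thus, the conditions of Theorem 1 are fulfilled with $\alpha_{ij} = \alpha_{ij}(\chi^*, \psi^*, \phi^*)$ for $i, j = 1, \dots, k$, $i \ne j$.
Because of this, in what follows we solve the problem of minimizing $L(\chi, \psi)$.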
Let us denote, for the rest of this article,

$$s_n^\psi = (1 - \psi_1) \cdots (1 - \psi_{n-1}) \psi_n \quad \text{and} \quad c_n^\psi = (1 - \psi_1) \cdots (1 - \psi_{n-1})$$

for any $n = 1, 2, \dots$ (being $s_1^\psi = \psi_1$ and $c_1^\psi = 1$). Respectively,

$$s_n^{\psi,\chi} = (1 - \psi_1^\chi) \cdots (1 - \psi_{n-1}^\chi) \psi_n^\chi \quad \text{and} \quad c_n^{\psi,\chi} = (1 - \psi_1^\chi) \cdots (1 - \psi_{n-1}^\chi)$$

for any $n = 1, 2, \dots$ (being $s_1^{\psi,\chi} = \psi_1^\chi$ and $c_1^{\psi,\chi} = 1$ as well).
Let also

$$C_n^{\psi,\chi} = \{ y^{(n)} : (1 - \psi_1^\chi(y^{(1)})) \cdots (1 - \psi_{n-1}^\chi(y^{(n-1)})) > 0 \},$$

for any $n \ge 2$, and let $C_1^{\psi,\chi}$ be the space of all $y^{(1)}$, and finally let

$$\bar{C}_n^{\psi,\chi} = \{ y^{(n)} : (1 - \psi_1^\chi(y^{(1)})) \cdots (1 - \psi_n^\chi(y^{(n)})) > 0 \},$$

for any $n \ge 1$.

3 OPTIMAL CONTROL AND STOPPING

In this section, the problem of finding strategies $(\chi, \psi)$ minimizing $L(\chi, \psi)$ (see (26)) will be solved.

3.1 Truncated Stopping rules

In this section, we solve, as an intermediate step, the problem of minimization of $L(\chi, \psi)$ over all strategies
with truncated stopping rules, i.e. such $\psi$ that

$$\psi = (\psi_1, \psi_2, \dots, \psi_{N-1}, 1, \dots). \tag{27}$$

Let $\Delta^N$ be the class of stopping rules $\psi$ of type (27), where $N$ is any integer, $N \ge 2$.
The following Theorem can be proved in the same way as Theorem 4.2 in [6].

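Theorem 4. Let $\psi \in \Delta^N$ be any (truncated) stopping rule, and $\chi$ any control policy. Then for any
$1 \le r \le N - 1$ the following inequalities hold true

$$L(\chi, \psi) \ge \sum_{n=1}^{r} \int s_n^{\psi,\chi} \bigl( n f_{\theta_1}^{n,\chi} + l_n^\chi \bigr) d\mu^n + \int c_{r+1}^{\psi,\chi} \bigl( (r+1) f_{\theta_1}^{r+1,\chi} + V_{r+1}^{N,\chi} \bigr) d\mu^{r+1} \tag{28}$$

$$\ge \sum_{n=1}^{r-1} \int s_n^{\psi,\chi} \bigl( n f_{\theta_1}^{n,\chi} + l_n^\chi \bigr) d\mu^n + \int c_r^{\psi,\chi} \bigl( r f_{\theta_1}^{r,\chi} + V_r^{N,\chi} \bigr) d\mu^r, \tag{29}$$

where $V_N^N \equiv l_N$, and recursively for $n = N, N - 1, \dots, 2$

$$V_{n-1}^N = \min\{ l_{n-1},\ f_{\theta_1}^{n-1} + R_{n-1}^N \}, \tag{30}$$

with

$$R_{n-1}^N = R_{n-1}^N(x^{(n-1)}, y^{(n-1)}) = \min_{x_n} \int V_n^N(x_1, \dots, x_n; y_1, \dots, y_n) \, d\mu(y_n). \tag{31}$$

The lower bound in (29) is attained if and only if

$$I_{\{l_n^\chi < f_{\theta_1}^{n,\chi} + R_n^{N,\chi}\}} \le \psi_n^\chi \le I_{\{l_n^\chi \le f_{\theta_1}^{n,\chi} + R_n^{N,\chi}\}} \tag{32}$$

$\mu^n$-almost everywhere on $C_n^{\psi,\chi}$ and

$$R_n^{N,\chi}(y^{(n)}) = \int V_{n+1}^{N,\chi}(y^{(n+1)}) \, d\mu(y_{n+1}) \tag{33}$$

$\mu^n$-almost everywhere on $\bar{C}_n^{\psi,\chi}$, for any $n = r, \dots, N - 1$.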

Remark 2. It is supposed in Theorem 4, and in what follows in this article, that all the functions $R_{n-1}^N$
defined by (31) are well-defined and measurable, for any $n = 2, \dots, N$, and for any $N = 1, 2, \dots$.

The following Corollary characterizes optimal strategies with truncated stopping rules. It immediately
follows from Theorem 4 applied for $r = 1$.

Corollary 1. For any truncated stopping rule $\psi \in \Delta^N$, and for any control rule $\chi$

$$L(\chi, \psi) \ge 1 + R_0^N, \tag{34}$$

where

$$R_0^N = \min_{x_1} \int V_1^N(x_1; y_1) \, d\mu(y_1). \tag{35}$$

The lower bound in (34) is attained if and only if (32) is satisfied $\mu^n$-almost everywhere on $C_n^{\psi,\chi}$ and
(33) is satisfied $\mu^n$-almost everywhere on $\bar{C}_n^{\psi,\chi}$, for any $n = 1, 2, \dots, N - 1$ and, additionally,

$$R_0^N = \int V_1^N(\chi_1; y_1) \, d\mu(y_1). \tag{36}$$

Remark 3. It is obvious that the testing procedure attaining the lower bound in (34) is optimal among
all truncated testing procedures with $\psi \in \Delta^N$. But it only makes practical sense if

$$l_0 = \min_{1 \le j \le k} \sum_{i \ne j} \lambda_{ij} > 1 + R_0^N.$$

The reason is that $l_0$ can be considered as "the $L(\chi, \psi)$" function for a trivial sequential testing
procedure $(\chi_0, \psi_0, \phi_0)$ which, without taking any observations, applies any decision rule $\phi_0$ such that
$\phi_{0j} \le I_{\{\sum_{i \ne j} \lambda_{ij} = l_0\}}$ for any $j = 1, \dots, k$. In this case there are no observations ($N(\chi_0, \psi_0) = 0$), $\chi_0$ is
nothing, and it is easily seen that

$$L(\chi_0, \psi_0, \phi_0) = \sum_{j=1}^{k} \sum_{i \ne j} \lambda_{ij} \phi_{0j} = l_0.$$

Thus, the inequality

$$l_0 \le 1 + R_0^N$$

means that the trivial testing procedure $(\chi_0, \psi_0, \phi_0)$ is not worse than the best testing procedure with $\psi$
from $\Delta^N$.

Because of this, we may think that

$$V_0^N = \min\{ l_0,\ 1 + R_0^N \}$$

is the minimum value of $L(\chi, \psi)$ when taking no observations is permitted. It is obvious that this is a
particular case of (30) with $n = 1$, if we define $f_{\theta_1}^0 \equiv 1$.



3.2 General Stopping Rules 

In this section we characterize the structure of general sequential testing procedures minimizing $L(\chi, \psi)$.
Let us define for any stopping rule $\psi$ and any control policy $\chi$

$$L_N(\chi, \psi) = \sum_{n=1}^{N-1} \int s_n^{\psi,\chi} \bigl( n f_{\theta_1}^{n,\chi} + l_n^\chi \bigr) d\mu^n + \int c_N^{\psi,\chi} \bigl( N f_{\theta_1}^{N,\chi} + l_N^\chi \bigr) d\mu^N. \tag{37}$$

This is the Lagrange-multiplier function corresponding to $\psi$ truncated at $N$, i.e. the rule $\psi^N$ with the
components $\psi^N = (\psi_1, \psi_2, \dots, \psi_{N-1}, 1, \dots)$: $L_N(\chi, \psi) = L(\chi, \psi^N)$.

Since $\psi^N$ is truncated, the results of the preceding section apply, in particular, the inequalities of
Theorem 4.

The idea of what follows is to let $N \to \infty$, to obtain some lower bounds for $L(\chi, \psi)$ from (28) and (29).
Obviously, we need that $L_N(\chi, \psi) \to L(\chi, \psi)$ as $N \to \infty$. One way to guarantee this is to use the
following definition.

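Let us denote by $\mathcal{F}$ the set of all strategies $(\chi, \psi)$ such that

$$\lim_{n \to \infty} E_{\theta_i}^\chi (1 - \psi_1) \cdots (1 - \psi_n) = 0 \quad \text{for any } i = 1, 2, \dots, k. \tag{38}$$

It is easy to see that (38) is equivalent to

$$P_{\theta_i}^\chi(\tau_\psi < \infty) = 1 \quad \text{for any } i = 1, 2, \dots, k$$

(see (1)).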
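
The equivalence rests on a telescoping sum. As a short sketch in the notation of (1), write $c_m = (1 - \psi_1) \cdots (1 - \psi_{m-1})$ with $c_1 = 1$, so that $(1 - \psi_1) \cdots (1 - \psi_{m-1}) \psi_m = c_m - c_{m+1}$. Summing (1) over $m \le n$ then gives

$$
P_\theta^\chi(\tau_\psi \le n) = \sum_{m=1}^{n} E_\theta^\chi (c_m - c_{m+1}) = 1 - E_\theta^\chi (1 - \psi_1) \cdots (1 - \psi_n),
$$

and therefore

$$
P_\theta^\chi(\tau_\psi < \infty) = 1 - \lim_{n \to \infty} E_\theta^\chi (1 - \psi_1) \cdots (1 - \psi_n),
$$

where the limit exists because the products are non-increasing in $n$.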

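Lemma 1. For any strategy $(\chi, \psi) \in \mathcal{F}$

$$\lim_{N \to \infty} L_N(\chi, \psi) = L(\chi, \psi).$$

Proof. Practically coincides with that of Lemma 5.1 in [6] (with $f_{\theta_1}$ instead of $f_{\theta_0}$), except that in order
to show the convergence

$$\int c_N^{\psi,\chi} \, l_N^\chi \, d\mu^N \to 0, \quad N \to \infty,$$

we use the following estimate:

$$\int c_N^{\psi,\chi} \, l_N^\chi \, d\mu^N \le \max_{i,j} \lambda_{ij} \sum_{i=1}^{k} \int c_N^{\psi,\chi} f_{\theta_i}^{N,\chi} \, d\mu^N = \max_{i,j} \lambda_{ij} \sum_{i=1}^{k} E_{\theta_i}^\chi c_N^\psi \to 0 \tag{39}$$

as $N \to \infty$, because of (38). □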

The second fact we need is about the behaviour of the functions $V_n^N$ which participate in the inequalities
of Theorem 4, as $N \to \infty$.

Lemma 2. For any $n \ge 1$ and for any $N \ge n$

$$V_n^N \ge V_n^{N+1}. \tag{40}$$

Proof. Completely analogous to the proof of Lemma 5.2 [6] (with $f_{\theta_1}$ instead of $f_{\theta_0}$). □

It follows from Lemma 2 that for any fixed $n \ge 1$ the sequence $V_n^N$ is non-increasing in $N$. So, there exists

$$V_n = \lim_{N \to \infty} V_n^N. \tag{41}$$

Now, passing to the limit, as $N \to \infty$, in (28) and (29) with $\psi = \psi^N$, we have the following Theorem.
The left-hand side of (28) tends to $L(\chi, \psi)$ by Lemma 1. Passing to the limit on the right-hand side of
(28) and in (29) is possible by Lebesgue's monotone convergence theorem, by virtue of Lemma 2.



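Theorem 5. Let $(\chi, \psi) \in \mathcal{F}$ be any control-stopping strategy. Then for any $r \ge 1$ the following
inequalities hold

$$L(\chi, \psi) \ge \sum_{n=1}^{r} \int s_n^{\psi,\chi} \bigl( n f_{\theta_1}^{n,\chi} + l_n^\chi \bigr) d\mu^n + \int c_{r+1}^{\psi,\chi} \bigl( (r+1) f_{\theta_1}^{r+1,\chi} + V_{r+1}^\chi \bigr) d\mu^{r+1} \tag{42}$$

$$\ge \sum_{n=1}^{r-1} \int s_n^{\psi,\chi} \bigl( n f_{\theta_1}^{n,\chi} + l_n^\chi \bigr) d\mu^n + \int c_r^{\psi,\chi} \bigl( r f_{\theta_1}^{r,\chi} + V_r^\chi \bigr) d\mu^r, \tag{43}$$

where

$$V_r = \min\{ l_r,\ f_{\theta_1}^r + R_r \}, \tag{44}$$

being

$$R_r = R_r(x^{(r)}, y^{(r)}) = \min_{x_{r+1}} \int V_{r+1}(x^{(r+1)}, y^{(r+1)}) \, d\mu(y_{r+1}). \tag{45}$$

In particular, for $r = 1$, the following lower bound holds true:

$$L(\chi, \psi) \ge 1 + \int V_1^\chi \, d\mu(y_1) \ge 1 + R_0, \tag{46}$$

where, by definition,

$$R_0 = \min_{x_1} \int V_1(x_1, y_1) \, d\mu(y_1).$$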



Exactly as in [6] (see Lemma 5.4 [6]) it can be proved that the right-hand side of (46) coincides with

$$\inf_{(\chi, \psi) \in \mathcal{F}} L(\chi, \psi).$$

In fact, this is true for any $\mathcal{F}$ such that $(\chi, \psi) \in \mathcal{F}$ implies $L_N(\chi, \psi) \to L(\chi, \psi)$ as $N \to \infty$.
The following theorem characterizes the structure of optimal strategies.

Theorem 6. If there is a strategy $(\chi, \psi) \in \mathcal{F}$ such that

$$L(\chi, \psi) = \inf_{(\chi', \psi') \in \mathcal{F}} L(\chi', \psi'), \tag{47}$$

then

$$I_{\{l_r^\chi < f_{\theta_1}^{r,\chi} + R_r^\chi\}} \le \psi_r^\chi \le I_{\{l_r^\chi \le f_{\theta_1}^{r,\chi} + R_r^\chi\}} \tag{48}$$

$\mu^r$-almost everywhere on $C_r^{\psi,\chi}$, and

$$\int V_{r+1}^\chi(y^{(r+1)}) \, d\mu(y_{r+1}) = R_r^\chi \tag{49}$$

$\mu^r$-almost everywhere on $\bar{C}_r^{\psi,\chi}$, for any $r = 1, 2, \dots$, where $\chi_1$ is defined in such a way that

$$\int V_1^\chi \, d\mu(y_1) = R_0. \tag{50}$$

On the other hand, if a strategy $(\psi, \chi)$ satisfies (48) $\mu^r$-almost everywhere on $C_r^{\psi,\chi}$, and satisfies (49)
$\mu^r$-almost everywhere on $\bar{C}_r^{\psi,\chi}$, for any $r = 1, 2, \dots$, where $\chi_1$ is such that (50) is fulfilled, and $(\chi, \psi) \in \mathcal{F}$,
then (47) holds.

Proof. Almost literally coincides with the proof of Theorem 5.5 [5] (substituting $f_{\theta_0}$ by $f_{\theta_1}$), with the
omission of the proof that $(\psi, \chi) \in \mathcal{F}$ in the "if"-part (see (76) and (77) in [3]), because now it is a
condition of Theorem 6.

□

Remark 4. Theorem 6 treats the optimality among strategies which take at least one observation. If we
allow taking no observations, there is a possibility that the trivial testing procedure (see Remark 3) gives
a better result. It is easy to see that this happens if and only if

$$l_0 < 1 + R_0.$$

4 LIKELIHOOD RATIO STRUCTURE OF OPTIMAL STRATEGY

In this section, we will give to the optimal strategy in Theorem 6 an equivalent form related to the
likelihood ratio process, supposing that all the distributions given by $f_{\theta_i}$ are absolutely continuous with
respect to that given by $f_{\theta_1}$. More precisely, we will suppose that for any $x$

$$\{ y : f_{\theta_1}(y|x) = 0 \} \subset \bigcap_{i > 1} \{ y : f_{\theta_i}(y|x) = 0 \}. \tag{51}$$

Let us start with defining the likelihood ratios:

$$Z_n^i = Z_n^i(x^{(n)}, y^{(n)}) = \prod_{j=1}^{n} \frac{f_{\theta_i}(y_j|x_j)}{f_{\theta_1}(y_j|x_j)}, \quad i = 2, \dots, k,$$

and let $Z_n = (Z_n^2, \dots, Z_n^k)$.

Let us introduce then the following sequence of functions $\rho_r = \rho_r(z)$, $r = 0, 1, \dots$, where
$z = (z_2, \dots, z_k)$.

Let

$$\rho_0(z) = g(z) \equiv \min_{1 \le j \le k} \sum_{i \ne j} \lambda_{ij} z_i, \tag{52}$$

where, by definition, $z_1 = 1$. Let for $r = 1, 2, 3, \dots$, recursively,
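
$$\rho_r(z) = \min\Bigl\{ g(z),\ 1 + \min_{x} \int f_{\theta_1}(y|x) \, \rho_{r-1}\Bigl( z_2 \frac{f_{\theta_2}(y|x)}{f_{\theta_1}(y|x)}, \dots, z_k \frac{f_{\theta_k}(y|x)}{f_{\theta_1}(y|x)} \Bigr) d\mu(y) \Bigr\} \tag{53}$$

(we are supposing that all $\rho_r$, $r = 0, 1, 2, \dots$, are well-defined and measurable functions of $z$). It is easy
to see that (see (21))

$$V_N^N = f_{\theta_1}^N \rho_0(Z_N),$$

and for $r = N - 1, N - 2, \dots, 1$

$$V_r^N = f_{\theta_1}^r \rho_{N-r}(Z_r). \tag{54}$$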
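
The identity (54) can be checked by backward induction; the following is a sketch of that check in the notation of (21), (30), (31) and (53), on the set where $f_{\theta_1}^n > 0$. For the base case, factoring $f_{\theta_1}^N$ out of (21) gives

$$
l_N = \min_{1 \le j \le k} \sum_{i \ne j} \lambda_{ij} f_{\theta_i}^N
    = f_{\theta_1}^N \min_{1 \le j \le k} \sum_{i \ne j} \lambda_{ij} Z_N^i
    = f_{\theta_1}^N g(Z_N), \qquad Z_N^1 = 1 .
$$

For the induction step, if $V_{r+1}^N = f_{\theta_1}^{r+1} \rho_{N-r-1}(Z_{r+1})$, then, since $f_{\theta_1}^{r+1} = f_{\theta_1}^r f_{\theta_1}(y_{r+1}|x_{r+1})$ and $Z_{r+1}^i = Z_r^i f_{\theta_i}(y_{r+1}|x_{r+1}) / f_{\theta_1}(y_{r+1}|x_{r+1})$,

$$
R_r^N = f_{\theta_1}^r \min_{x} \int f_{\theta_1}(y|x) \, \rho_{N-r-1}\Bigl( Z_r^2 \frac{f_{\theta_2}(y|x)}{f_{\theta_1}(y|x)}, \dots, Z_r^k \frac{f_{\theta_k}(y|x)}{f_{\theta_1}(y|x)} \Bigr) d\mu(y),
$$

so that

$$
V_r^N = \min\{ l_r,\ f_{\theta_1}^r + R_r^N \}
      = f_{\theta_1}^r \min\Bigl\{ g(Z_r),\ 1 + \min_x \int f_{\theta_1}(y|x) \, \rho_{N-r-1}(\cdots) \, d\mu(y) \Bigr\}
      = f_{\theta_1}^r \rho_{N-r}(Z_r).
$$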

It is not difficult to see (very much like in Lemma 2) that

$$\rho_r(z) \ge \rho_{r+1}(z)$$

for any $r = 0, 1, 2, \dots$, so there exists

$$\rho(z) = \lim_{n \to \infty} \rho_n(z). \tag{55}$$

Using arguments similar to those used for obtaining Theorem 5, it can be shown, starting from (53), that

$$\rho(z) = \min\{ g(z),\ 1 + R(z) \}, \tag{56}$$

where

$$R(z) = \min_{x} \int f_{\theta_1}(y|x) \, \rho\Bigl( z_2 \frac{f_{\theta_2}(y|x)}{f_{\theta_1}(y|x)}, \dots, z_k \frac{f_{\theta_k}(y|x)}{f_{\theta_1}(y|x)} \Bigr) d\mu(y). \tag{57}$$

Let us pass now to the limit, as $N \to \infty$, in (54). We see that

$$V_r = f_{\theta_1}^r \rho(Z_r).$$

Using this expression in Theorem 6 we get

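Theorem 7. If there exists a strategy $(\chi, \psi) \in \mathcal{F}$ such that

$$L(\chi, \psi) = \inf_{(\chi', \psi') \in \mathcal{F}} L(\chi', \psi'), \tag{58}$$

then

$$I_{\{g(Z_r^\chi) < 1 + R(Z_r^\chi)\}} \le \psi_r^\chi \le I_{\{g(Z_r^\chi) \le 1 + R(Z_r^\chi)\}} \tag{59}$$

$P_{\theta_1}^\chi$-almost surely on

$$\{ y^{(r)} : (1 - \psi_1^\chi(y^{(1)})) \cdots (1 - \psi_{r-1}^\chi(y^{(r-1)})) > 0 \}, \tag{60}$$

and

$$\int f_{\theta_1}(y|\chi_{r+1}) \, \rho\Bigl( Z_r^{2,\chi} \frac{f_{\theta_2}(y|\chi_{r+1})}{f_{\theta_1}(y|\chi_{r+1})}, \dots, Z_r^{k,\chi} \frac{f_{\theta_k}(y|\chi_{r+1})}{f_{\theta_1}(y|\chi_{r+1})} \Bigr) d\mu(y) = R(Z_r^\chi) \tag{61}$$

$P_{\theta_1}^\chi$-almost surely on

$$\{ y^{(r)} : (1 - \psi_1^\chi(y^{(1)})) \cdots (1 - \psi_r^\chi(y^{(r)})) > 0 \}, \tag{62}$$

where $\chi_1$ is defined in such a way that

$$\int f_{\theta_1}(y|\chi_1) \, \rho\Bigl( \frac{f_{\theta_2}(y|\chi_1)}{f_{\theta_1}(y|\chi_1)}, \dots, \frac{f_{\theta_k}(y|\chi_1)}{f_{\theta_1}(y|\chi_1)} \Bigr) d\mu(y) = R(1, \dots, 1). \tag{63}$$

On the other hand, if $(\chi, \psi)$ satisfies (59) $P_{\theta_1}^\chi$-almost surely on (60) and satisfies (61) $P_{\theta_1}^\chi$-almost surely on (62),
for any $r = 1, 2, \dots$, where $\chi_1$ satisfies (63), and $(\chi, \psi) \in \mathcal{F}$, then $(\chi, \psi)$ satisfies (58).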






5 APPLICATION TO THE CONDITIONAL PROBLEMS 

In this section, we apply the results obtained in the preceding sections to minimizing the average sample
size $N(\chi, \psi) = E_{\theta_1}^\chi \tau_\psi$ over all sequential testing procedures with error probabilities not exceeding some
prescribed levels (see Problems I and II in Section 1).

Combining Theorems 1, 3 and 6 we immediately have the following solution to Problem I.

Theorem 8. Let $(\chi, \psi) \in \mathcal{F}$ satisfy the conditions of Theorem 6 with $\lambda_{ij} > 0$, $i, j = 1, \dots, k$, $i \ne j$
(recall that $l_n$, $V_n$, and $R_n$ are functions of $\lambda_{ij}$), and let $\phi$ be any decision rule satisfying (22).
Then for any sequential testing procedure $(\chi', \psi', \phi')$ with $(\chi', \psi') \in \mathcal{F}$ such that

$$\alpha_{ij}(\chi', \psi', \phi') \le \alpha_{ij}(\chi, \psi, \phi) \quad \text{for any } i, j = 1, \dots, k,\ i \ne j, \tag{64}$$

it holds

$$N(\chi', \psi') \ge N(\chi, \psi). \tag{65}$$

The inequality in (65) is strict if at least one of the inequalities in (64) is strict.

If there are equalities in all of the inequalities in (64) and (65), then $(\chi', \psi')$ satisfies the conditions of
Theorem 6 as well (with $\chi'$ instead of $\chi$ and $\psi'$ instead of $\psi$).

Proof. The only thing to be proved is the last assertion.
Let us suppose that

$$\alpha_{ij}(\chi', \psi', \phi') = \alpha_{ij}(\chi, \psi, \phi) \quad \text{for any } i, j = 1, \dots, k,\ i \ne j,$$

and

$$N(\chi', \psi') = N(\chi, \psi).$$

Then, obviously,

$$L(\chi, \psi, \phi) = L(\chi, \psi) = L(\chi', \psi', \phi') \ge L(\chi', \psi') \tag{66}$$

(see (7) and Remark 1).

By Theorem 6, there cannot be a strict inequality in the last inequality in (66), so $L(\chi, \psi) = L(\chi', \psi')$.
From Theorem 6 it follows now that $(\chi', \psi')$ satisfies the conditions of Theorem 6 as well. □

Analogously, combining Theorems 2, 3 and 6 we also have the following solution to Problem II.

Theorem 9. Let $(\chi, \psi) \in \mathcal{F}$ satisfy the conditions of Theorem 6 with $\lambda_{ij} = \lambda_i > 0$ for any $i = 1, \dots, k$
and for any $j = 1, \dots, k$, and let $\phi$ be any decision rule such that

$$\phi_{nj} \le I_{\{\sum_{i \ne j} \lambda_i f_{\theta_i}^n = \min_{1 \le l \le k} \sum_{i \ne l} \lambda_i f_{\theta_i}^n\}}$$

for any $j = 1, \dots, k$ and for any $n = 1, 2, \dots$.

Then for any sequential testing procedure $(\chi', \psi', \phi')$ with $(\chi', \psi') \in \mathcal{F}$ such that

$$\beta_i(\chi', \psi', \phi') \le \beta_i(\chi, \psi, \phi) \quad \text{for any } i = 1, \dots, k, \tag{67}$$

it holds

$$N(\chi', \psi') \ge N(\chi, \psi). \tag{68}$$

The inequality in (68) is strict if at least one of the inequalities in (67) is strict.

If there are equalities in all of the inequalities in (67) and (68), then $(\chi', \psi')$ satisfies the conditions
of Theorem 6 with $\lambda_{ij} = \lambda_i$, $i, j = 1, \dots, k$, $i \ne j$, as well (with $\chi'$ instead of $\chi$ and $\psi'$ instead of $\psi$).

6 ADDITIONAL RESULTS, EXAMPLES AND DISCUSSION

6.1 Some general remarks

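Remark 5. The class $\mathcal{F}$ defined by (38) can be extended in such a way that Theorem 6 remains valid.
It can be defined as the class of all the strategies $(\chi, \psi)$ for which

$$\lim_{n \to \infty} E_{\theta_i}^\chi (1 - \psi_1) \cdots (1 - \psi_n) = 0 \tag{69}$$

for at least $k - 1$ different values of $\theta_i$. To see this it is sufficient to notice that for any strategy in this
extended class

$$L_N(\chi, \psi) \to L(\chi, \psi), \quad \text{as } N \to \infty,$$

because (see the proof of Lemma 1)

$$\int c_N^{\psi,\chi} \, l_N^\chi \, d\mu^N \le \sum_{i \ne j} \lambda_{ij} \int c_N^{\psi,\chi} f_{\theta_i}^{N,\chi} \, d\mu^N = \sum_{i \ne j} \lambda_{ij} E_{\theta_i}^\chi c_N^\psi \to 0, \quad N \to \infty,$$

if $j$ corresponds to $\theta_j$ for which (69) does not hold.

Obviously, Theorem 7 remains valid with this extension of $\mathcal{F}$.

Moreover, in the same way, Theorem 6 remains valid if $\mathcal{F}$ is defined as the class of all strategies $(\chi, \psi)$
for which

$$L_N(\chi, \psi) \to L(\chi, \psi), \quad N \to \infty.$$

But the statistical meaning of this class is not clear, so we prefer for $\mathcal{F}$ one of the definitions above.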

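Remark 6. In the same way as in the preceding sections, a more general problem than just minimizing
$N(\theta_1; \chi, \psi)$ can be treated (see (4) and Problems I and II thereafter).

Namely, we can minimize any convex combination of the average sample numbers, or

$$\sum_{i=1}^{k} c_i N(\theta_i; \chi, \psi),$$

where $c_i \ge 0$, $i = 1, \dots, k$, are arbitrary but fixed constants. More exactly, if we modify the definition of
the functions $V_r^N$ in (30) to

$$V_{r-1}^N = \min\Bigl\{ l_{r-1},\ \sum_{i=1}^{k} c_i f_{\theta_i}^{r-1} + R_{r-1}^N \Bigr\} \tag{70}$$

for $r = N, \dots, 2$, being, as before,

$$V_r = \lim_{N \to \infty} V_r^N,$$

and, respectively, change (48) in Theorem 6 to

$$I_{\{l_r^\chi < \sum_{i=1}^{k} c_i f_{\theta_i}^{r,\chi} + R_r^\chi\}} \le \psi_r^\chi \le I_{\{l_r^\chi \le \sum_{i=1}^{k} c_i f_{\theta_i}^{r,\chi} + R_r^\chi\}}, \tag{71}$$

then Theorem 6 remains valid. Theorems 7, 8 and 9 can be modified respectively.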

6.2 An example 

In this Section we show how our results can be applied to a concrete statistical model.

Let us suppose that any stage of our experiment is a regression experiment with a normal response.
More specifically, we are supposing that the distribution of $Y$, given a value of the control variable $X$, is
normal with mean value $\theta X$ and a known variance $\sigma^2$, say $\sigma^2 = 1$.

Thus,

$$f_\theta(y|x) = \frac{1}{\sqrt{2\pi}} \exp\Bigl\{ -\frac{(y - \theta x)^2}{2} \Bigr\}, \quad -\infty < y < \infty. \tag{72}$$

For simplicity, let us take $k = 2$ simple hypotheses, for example, $H_1: \theta = 1$ and $H_2: \theta = 2$, and
suppose that the control variable takes only two values, say, $x = 1$ and $x = 2$.
Condition (51) is fulfilled in an obvious way.

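Let $\lambda_{12} > 0$ and $\lambda_{21} > 0$ be two arbitrary constants. We start defining

$$\rho_0(z) = g(z) = \min\{ \lambda_{12},\ \lambda_{21} z \}$$

(see (52)).

Next, we calculate

$$\frac{f_{\theta_2}(y|x)}{f_{\theta_1}(y|x)} = \exp\{ xy - 3x^2/2 \},$$

and

$$\rho_{n+1}(z) = \min\Bigl\{ g(z),\ 1 + \min_{x} \int_{-\infty}^{\infty} \rho_n\bigl( z \exp\{ xy - 3x^2/2 \} \bigr) \frac{\exp\{ -(y - x)^2/2 \}}{\sqrt{2\pi}} \, dy \Bigr\}$$

for $n = 0, 1, 2, \dots$ (see (53)).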
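
For completeness, the likelihood ratio above follows from (72) with $\theta_1 = 1$ and $\theta_2 = 2$ by expanding the squares:

$$
\frac{f_{\theta_2}(y|x)}{f_{\theta_1}(y|x)}
= \exp\Bigl\{ -\frac{(y - 2x)^2}{2} + \frac{(y - x)^2}{2} \Bigr\}
= \exp\Bigl\{ \frac{-4x^2 + 4xy + x^2 - 2xy}{2} \Bigr\}
= \exp\{ xy - 3x^2/2 \}.
$$

Under $H_1$ one can write $Y = x + \varepsilon$ with $\varepsilon$ standard normal, so the log-likelihood-ratio increment equals $x\varepsilon - x^2/2$, which is normal with mean $-x^2/2$ and variance $x^2$.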
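Let $\rho(z) = \lim_{n \to \infty} \rho_n(z)$, and

$$R(z) = \min_{x} \int_{-\infty}^{\infty} \rho\bigl( z \exp\{ xy - 3x^2/2 \} \bigr) \frac{\exp\{ -(y - x)^2/2 \}}{\sqrt{2\pi}} \, dy.$$

Now, by Theorem 7, an optimal strategy will be defined on the basis of the likelihood ratio process

$$Z_n = \exp\Bigl\{ \sum_{i=1}^{n} (X_i Y_i - 3X_i^2/2) \Bigr\},$$

being the optimal stopping time $\tau = \min\{ n : g(Z_n) \le 1 + R(Z_n) \}$, whereas at each stage $n = 1, 2, \dots$
the next control value $X_{n+1} = x$ ($x = 1$ or $x = 2$) is defined in such a way that

$$R(Z_n) = \int_{-\infty}^{\infty} \rho\bigl( Z_n \exp\{ xy - 3x^2/2 \} \bigr) \frac{\exp\{ -(y - x)^2/2 \}}{\sqrt{2\pi}} \, dy,$$

starting from $X_1$ defined as $x$ ($x = 1$ or $x = 2$) for which

$$R(1) = \int_{-\infty}^{\infty} \rho\bigl( \exp\{ xy - 3x^2/2 \} \bigr) \frac{\exp\{ -(y - x)^2/2 \}}{\sqrt{2\pi}} \, dy.$$

When the test terminates at some stage $\tau = n$, we should reject $H_1$ if $\lambda_{21} Z_n > \lambda_{12}$, and accept $H_1$
otherwise (see Theorem 3).

One can vary the error probability levels of this test by changing the values of $\lambda_{12}$ and $\lambda_{21}$.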
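
The recursion for $\rho_n$ in this example can be approximated numerically. The sketch below is only one possible discretization and is not taken from the article: it works on a grid in $u = \log z$, replaces the normal integral by a midpoint rule, and interpolates linearly; the function names, grid limits and quadrature size are all assumptions chosen for illustration.

```python
import math
from bisect import bisect_right

def make_quadrature(m=41, half_width=6.0):
    """Midpoint-rule nodes and normalized weights approximating a standard normal."""
    h = 2 * half_width / m
    nodes = [-half_width + (i + 0.5) * h for i in range(m)]
    w = [math.exp(-t * t / 2) for t in nodes]
    total = sum(w)
    return nodes, [wi / total for wi in w]

def interp(grid, vals, u):
    """Piecewise-linear interpolation, held constant beyond the grid ends."""
    if u <= grid[0]:
        return vals[0]
    if u >= grid[-1]:
        return vals[-1]
    i = bisect_right(grid, u) - 1
    t = (u - grid[i]) / (grid[i + 1] - grid[i])
    return (1 - t) * vals[i] + t * vals[i + 1]

def value_iteration(lam12, lam21, controls=(1.0, 2.0), n_iter=20,
                    u_min=-12.0, u_max=12.0, n_grid=241):
    """Approximate rho_0, ..., rho_{n_iter} on a grid of u = log z.

    Returns (grid, g, history) where history[n][i] approximates rho_n(exp(grid[i])).
    """
    grid = [u_min + i * (u_max - u_min) / (n_grid - 1) for i in range(n_grid)]
    g = [min(lam12, lam21 * math.exp(u)) for u in grid]   # g(z) = min{lam12, lam21 z}
    nodes, weights = make_quadrature()
    rho = g[:]
    history = [rho]
    for _ in range(n_iter):
        new = []
        for u, gu in zip(grid, g):
            # Under H1, Y = x + eps, so log z moves by x*eps - x^2/2.
            cont = min(
                sum(w * interp(grid, rho, u + x * e - x * x / 2)
                    for e, w in zip(nodes, weights))
                for x in controls)
            new.append(min(gu, 1.0 + cont))               # stop vs. continue
        rho = new
        history.append(rho)
    return grid, g, history
```

In such an approximation, the grid points where the last iterate is strictly below `g` form an estimate of the continuation region, and the control attaining the inner minimum at a given point estimates the next control value.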



6.3 Bayesian testing of multiple hypotheses 

In this section we characterize the structure of Bayesian multiple hypothesis tests. 

Let $\pi_i > 0$, $i = 1, \dots, k$, be prior probabilities of $H_i$, $i = 1, \dots, k$, respectively, $\sum_{i=1}^{k} \pi_i = 1$, and
let $w_{ij} \ge 0$, $i, j = 1, \dots, k$, be some losses due to incorrect decisions (we assume that $w_{ii} = 0$ for any
$i = 1, \dots, k$). Then, for any sequential testing procedure $(\chi, \psi, \phi)$, we define the Bayes risk as

$$r(\chi, \psi, \phi) = \sum_{i=1}^{k} \pi_i \Bigl( c E_{\theta_i}^\chi \tau_\psi + \sum_{j=1}^{k} w_{ij} \alpha_{ij}(\chi, \psi, \phi) \Bigr), \tag{73}$$

where $c > 0$ is some unitary observation cost (cf. Section 9.4 of [12], see also Chapter 5 of ^ for a more
general sequential Bayesian decision theory, both monographs treating non-controlled experiments). Let
us call Bayesian any testing procedure $(\chi, \psi, \phi)$ minimizing (73).

In this section, we show that the Bayesian testing procedures always exist, and characterize the
structure of both truncated and non-truncated Bayesian testing procedures for the controlled experiments.
To formulate our results, we use the notation of Sections 1-5, but we have to re-define some elements
that have been defined therein.

First of all, it is easy to see from Theorem 3 that the optimal decision rule has the following form.
Let

$$l_n = \min_{1 \le j \le k} \sum_{i=1}^{k} \pi_i w_{ij} f_{\theta_i}^n \tag{74}$$

(cf. (21)). Then the decision rule $\phi$ is optimal ($\inf_{\phi'} r(\chi, \psi, \phi') = r(\chi, \psi, \phi)$ for any $\chi$ and $\psi$) if

$$\phi_{nj} \le I_{\{\sum_{i=1}^{k} \pi_i w_{ij} f_{\theta_i}^n = l_n\}} \tag{75}$$

for any $j = 1, \dots, k$ and for any $n = 1, 2, \dots$ (see Theorem 3).
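
In words, the Bayes decision rule accepts a hypothesis that minimizes the prior-weighted expected loss $\sum_i \pi_i w_{ij} f_{\theta_i}^n$. A minimal sketch, in which the function name and the tie-breaking convention (smallest index) are assumptions made here:

```python
def bayes_decision(prior, loss, dens):
    """Return (j, l_n): a 1-based index j attaining l_n = min_j sum_i prior[i]*loss[i][j]*dens[i].

    prior[i]   -- prior probability pi_{i+1}
    loss[i][j] -- loss w_{ij} of accepting H_{j+1} when H_{i+1} is true (zero diagonal)
    dens[i]    -- value of the joint density f^n under H_{i+1}
    """
    k = len(prior)
    costs = [sum(prior[i] * loss[i][j] * dens[i] for i in range(k))
             for j in range(k)]
    l_n = min(costs)
    return costs.index(l_n) + 1, l_n
```

With 0-1 losses this is the familiar rule of accepting the hypothesis with the largest value of $\pi_i f_{\theta_i}^n$, i.e. the largest posterior probability.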

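Let $\pi$ be the prior distribution defined by $\pi_i$, $i = 1, \dots, k$, and let, by definition,

$$f_\pi^n = \sum_{i=1}^{k} \pi_i f_{\theta_i}^n$$

for any $n = 1, 2, \dots$.

For any $N = 1, 2, \dots$ let us define

$$V_N^N \equiv l_N, \tag{76}$$

and for any $n = N - 1, N - 2, \dots, 1$, recursively,

$$V_n^N = \min\{ l_n,\ c f_\pi^n + R_n^N \}, \tag{77}$$

where

$$R_n^N = R_n^N(x^{(n)}, y^{(n)}) = \min_{x_{n+1}} \int V_{n+1}^N(x_1, \dots, x_{n+1}; y_1, \dots, y_{n+1}) \, d\mu(y_{n+1}). \tag{78}$$

Let also

$$R_0^N = \min_{x_1} \int V_1^N(x_1; y_1) \, d\mu(y_1). \tag{79}$$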

The following Theorem characterizes Bayesian procedures with truncated stopping rules and can be 
proved in exactly the same way as Corollary [1] 

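Theorem 10. Let $\chi$ be any control policy, $\psi \in \Delta^N$ be any (truncated) stopping rule and $\phi$ any decision
rule satisfying (75) for any $j = 1, \dots, k$ and for any $n = 1, 2, \dots$. Then

$$r(\chi, \psi, \phi) \ge c + R_0^N. \qquad (80)$$

There is an equality in (80) if and only if

$$I_{\{l_n^{\chi} < c f_\pi^{n,\chi} + R_n^{N,\chi}\}} \le \psi_n^{\chi} \le I_{\{l_n^{\chi} \le c f_\pi^{n,\chi} + R_n^{N,\chi}\}} \qquad (81)$$

$\mu^n$-almost everywhere on $C_n^{\psi,\chi}$ and

$$R_n^{N,\chi}(y^{(n)}) = \int V_{n+1}^{N,\chi}(y^{(n+1)}) \, d\mu(y_{n+1}) \qquad (82)$$

$\mu^n$-almost everywhere on $C_n^{\psi,\chi}$, for any $n = 1, \dots, N - 1$, and, additionally,

$$R_0^N = \int V_1^N(\chi_1; y_1) \, d\mu(y_1). \qquad (83)$$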
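
The recursion (76)-(79) and the lower bound (80) can be evaluated numerically in small discrete models. The sketch below is only an illustration under assumptions that are not part of the text: the observations and the controls take finitely many values, $\mu$ is the counting measure, and the functions are evaluated through the vector $g = (\pi_i f_{\theta_i}^{n})_{i}$, on which $l_n$, $f_\pi^{n}$ and hence $V_n^N$ depend. All names (`min_loss`, `value`, `continuation`, `truncated_risk_bound`, `DENS`, `LOSS`) and the two-control Bernoulli model are hypothetical, and the plain recursion is exponential in $N$.

```python
def min_loss(g, loss):
    """l_n of (74) as a function of g[i] = pi_i * f_{theta_i}^n."""
    k = len(g)
    return min(sum(loss[i][j] * g[i] for i in range(k)) for j in range(k))


def continuation(g, n, N, loss, c, dens):
    """R_n^N of (78)-(79): minimum over the next control of sum_y V_{n+1}^N."""
    best = float("inf")
    for table in dens.values():          # one table per control value x
        total = 0.0
        for fy in table.values():        # fy[i] = f_{theta_i}(y | x)
            g_next = [gi * fi for gi, fi in zip(g, fy)]
            total += value(g_next, n + 1, N, loss, c, dens)
        best = min(best, total)
    return best


def value(g, n, N, loss, c, dens):
    """V_n^N of (76)-(77): stop (pay l_n) or continue (pay c * f_pi^n + R_n^N)."""
    stop = min_loss(g, loss)
    if n >= N:
        return stop
    return min(stop, c * sum(g) + continuation(g, n, N, loss, c, dens))


def truncated_risk_bound(prior, N, loss, c, dens):
    """Right-hand side c + R_0^N of (80)."""
    return c + continuation(list(prior), 0, N, loss, c, dens)


# Hypothetical model: control "a" is uninformative, control "b" is informative.
DENS = {
    "a": {0: [0.5, 0.5], 1: [0.5, 0.5]},
    "b": {0: [0.8, 0.2], 1: [0.2, 0.8]},
}
LOSS = [[0, 1], [1, 0]]
bound_1 = truncated_risk_bound([0.5, 0.5], 1, LOSS, 0.05, DENS)
bound_3 = truncated_risk_bound([0.5, 0.5], 3, LOSS, 0.05, DENS)
```

In this toy model the minimum over the first control is attained at the informative control "b", and the bound cannot increase when the truncation level $N$ grows, which is consistent with passing to the limit as $N \to \infty$ afterwards.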

Let now $V_n = \lim_{N \to \infty} V_n^N$, $n = 1, 2, \dots$. Respectively, $R_n = \lim_{N \to \infty} R_n^N$, $n = 0, 1, 2, \dots$.

Theorem 11. Let $\chi$ be any control policy, $\psi$ any stopping rule, and $\phi$ any decision rule satisfying (75)
for any $j = 1, \dots, k$ and for any $n = 1, 2, \dots$. Then

$$r(\chi, \psi, \phi) \ge c + R_0. \qquad (84)$$

There is an equality in (84) if and only if

$$I_{\{l_n^{\chi} < c f_\pi^{n,\chi} + R_n^{\chi}\}} \le \psi_n^{\chi} \le I_{\{l_n^{\chi} \le c f_\pi^{n,\chi} + R_n^{\chi}\}} \qquad (85)$$

$\mu^n$-almost everywhere on $C_n^{\psi,\chi}$ and

$$R_n^{\chi}(y^{(n)}) = \int V_{n+1}^{\chi}(y^{(n+1)}) \, d\mu(y_{n+1}) \qquad (86)$$

$\mu^n$-almost everywhere on $C_n^{\psi,\chi}$, for any $n = 1, 2, \dots$, and, additionally,

$$R_0 = \int V_1(\chi_1; y_1) \, d\mu(y_1). \qquad (87)$$



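Proof. First of all we need to prove that (84) holds for any strategy $(\chi, \psi)$. Obviously, it suffices to
prove this only for such $(\chi, \psi)$ that $r(\chi, \psi, \phi) < \infty$. But this latter fact implies, in particular, that
$\sum_{i=1}^{k} \pi_i E_{\theta_i}^{\chi} \tau_\psi < \infty$ (see (73)). Because $\pi_i > 0$ for any $i = 1, \dots, k$, it follows that $(\chi, \psi)$ satisfies (38), so

$$r(\chi, \psi^N, \phi) \to r(\chi, \psi, \phi), \quad N \to \infty,$$

where $\psi^N$, by definition, is $(\psi_1, \psi_2, \dots, \psi_{N-1}, 1, \dots)$ (see the proof of Lemma 1). Since $\psi^N$ is truncated,
Theorem 10 gives $r(\chi, \psi^N, \phi) \ge c + R_0^N \ge c + R_0$, and (84) follows by passing to the limit.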

The rest of the proof of the "only if"-part is completely analogous to the corresponding part of the
proof of Theorem 5 (or Theorem 5.5 of [6]).

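To prove the "if"-part, first it can be shown, analogously to the proof of Theorem 5.5 of [6], that

$$\sum_{n=1}^{r} \int s_n^{\psi,\chi} \left( c n f_\pi^{n,\chi} + l_n^{\chi} \right) d\mu^{n} + \int c_{r+1}^{\psi,\chi} \left( c (r+1) f_\pi^{r+1,\chi} + V_{r+1}^{\chi} \right) d\mu^{r+1} = c + R_0, \qquad (88)$$

for any $r = 0, 1, 2, \dots$, if $(\psi, \chi)$ satisfies (85)-(87).
Because $c > 0$, we have from (88), in particular, that

$$\sum_{i=1}^{k} \pi_i P_{\theta_i}^{\chi}(\tau_\psi \ge r + 1) = \int c_{r+1}^{\psi,\chi} f_\pi^{r+1,\chi} \, d\mu^{r+1} \le \frac{c + R_0}{c (r+1)} \to 0 \quad \text{as } r \to \infty.$$

Because $\pi_i > 0$ for all $i = 1, \dots, k$, this implies that (38) is fulfilled for $(\chi, \psi)$. It follows from (88) now
that

$$\lim_{r \to \infty} \sum_{n=1}^{r} \int s_n^{\psi,\chi} \left( c n f_\pi^{n,\chi} + l_n^{\chi} \right) d\mu^{n} = r(\chi, \psi, \phi) \le c + R_0.$$

Along with (84) this gives that $r(\chi, \psi, \phi) = c + R_0$, i.e. there is an equality in (84). □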



6.4 Experiments without control 

In this section we draw consequences for statistical experiments without control. 

Let us suppose that the density of $Y$ given $X$ does not depend on $X$: $f_\theta(y|x) = f_\theta(y)$ for any $y$ and for
any $\theta$, meaning that there is no way to control the flow of the experiment, and the observations $Y_1, Y_2, \dots$
are independent and identically distributed (i.i.d.) random "variables" with probability "density" function
$f_\theta(y)$. We can incorporate this particular case in the above scheme of controlled experiments by thinking
that there is some (fictitious) unique value of the control variable at each stage of the experiment, so that
any control policy is trivial.

Because of this, any (sequential) testing procedure has in effect only two components in this case: a
stopping rule $\psi$ and a decision rule $\phi$. So we use the notation of Section 6.3, simply omitting any mention
of the control policy. For example, for any testing procedure $(\psi, \phi)$ the Bayesian risk (73) is now:

$$r(\psi, \phi) = \sum_{i=1}^{k} \pi_i \left( c E_{\theta_i} \tau_\psi + \sum_{j=1}^{k} w_{ij} \alpha_{ij}(\psi, \phi) \right). \qquad (89)$$

Respectively, $f_\theta^{n} = f_\theta^{n}(y^{(n)}) = \prod_{i=1}^{n} f_\theta(y_i)$ in (??) now, and the functions $l_n$, $V_n^N$, $V_n$, $R_n$, etc. of the
preceding section are all functions of $y^{(n)}$ only.

Theorem 11 of Section 6.3 transforms now to



Theorem 12. Let $\psi$ be any stopping rule and $\phi$ any decision rule satisfying (75) for any $j = 1, \dots, k$
and for any $n = 1, 2, \dots$. Then

$$r(\psi, \phi) \ge c + R_0. \qquad (90)$$

There is an equality in (90) if and only if

$$I_{\{l_n < c f_\pi^{n} + R_n\}} \le \psi_n \le I_{\{l_n \le c f_\pi^{n} + R_n\}} \qquad (91)$$

$\mu^n$-almost everywhere on $C_n^{\psi}$ for any $n = 1, 2, \dots$, where

$$R_n = R_n(y_1, \dots, y_n) = \int V_{n+1}(y_1, \dots, y_{n+1}) \, d\mu(y_{n+1}),$$

with, for any $n = 1, 2, \dots$, $V_n(y^{(n)}) = \lim_{N \to \infty} V_n^N(y^{(n)})$, where $V_N^N = l_N$, and

$$V_n^N(y^{(n)}) = \min\left\{ l_n(y^{(n)}), \; c f_\pi^{n}(y^{(n)}) + \int V_{n+1}^N(y^{(n+1)}) \, d\mu(y_{n+1}) \right\}$$

for any $n = N - 1, \dots, 1$, $N = 1, 2, \dots$.

In particular, this theorem gives all solutions to the problem of Bayesian testing of multiple simple
hypotheses for independent and identically distributed observations when the cost of observations is linear
(see Section 9.4 of [12] and suppose that $K(X_1, \dots, X_n) = n$ therein).

In the particular case of two hypotheses ($k = 2$) a Bayesian test of Theorem 12 given by

$$\psi_n = I_{\{l_n \le c f_\pi^{n} + R_n\}}, \quad n = 1, 2, \dots,$$

has the form of the Sequential Probability Ratio Test (SPRT, see [11]), all other Bayesian tests
(91) being randomizations at its boundaries (see [8] for closely related results).
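
For the reader's convenience, the general shape of an SPRT for two simple hypotheses can be sketched as follows. This is only an illustration of the stopping mechanism: the thresholds `lower` and `upper` are hypothetical inputs and are not the Bayes-optimal boundaries, which depend on $c$, $\pi_i$ and $w_{ij}$; the Bernoulli model and the function names are likewise assumptions made for the example.

```python
import math


def sprt(observations, log_lr, lower, upper):
    """Return (n, decision) at the first n where the cumulative log-likelihood
    ratio log(f_2^n / f_1^n) leaves the open interval (lower, upper); decision
    is 2 (accept H2) on reaching `upper` and 1 (accept H1) on reaching `lower`.
    Returns None if the data run out before a boundary is reached."""
    total = 0.0
    for n, y in enumerate(observations, start=1):
        total += log_lr(y)
        if total >= upper:
            return n, 2
        if total <= lower:
            return n, 1
    return None


def bernoulli_log_lr(y, p1=0.2, p2=0.8):
    """Log-likelihood ratio of one Bernoulli observation, H2 (p2) against H1 (p1)."""
    if y == 1:
        return math.log(p2 / p1)
    return math.log((1 - p2) / (1 - p1))


# Hypothetical symmetric thresholds.
result_up = sprt([1, 1, 1], bernoulli_log_lr, -math.log(9), math.log(9))
result_down = sprt([1, 0, 0, 0], bernoulli_log_lr, -math.log(9), math.log(9))
```

Here the test stops as soon as the boundary is reached; the randomized variants mentioned above differ from it only in how they act when the statistic falls exactly on a boundary.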